An example of how optimizing for short-term rewards can make you weaker

from 2023-02-14 Organizing the top page's navigation

@tsukammo: I'm having trouble explaining why life optimization doesn't work, in terms of game tree search.

https://gyazo.com/597878edc889a3c2489d01be73177041

@tsukammo: This is what happens with an evaluation function based on direct rewards alone. That's why common "lifehacks" augment the evaluation function with "curiosity" or "prepare rewards by chopping the task into small steps".
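A minimal sketch of the failure, assuming a tiny hypothetical game tree (the node names, reward values, and the functions `greedy_path` and `best_path` are all made up for illustration). The greedy player scores each move only by its immediate reward, while the searching player looks at the total reward along the whole path. The "curiosity" and "small steps" fixes are not shown here.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Node:
    """Hypothetical game-tree node; `reward` is the immediate reward for moving into it."""
    name: str
    reward: float
    children: List["Node"] = field(default_factory=list)


def greedy_path(root: Node) -> Tuple[List[str], float]:
    """Evaluate moves by direct reward alone: always take the child with the highest immediate reward."""
    path, total, node = [], 0.0, root
    while node.children:
        node = max(node.children, key=lambda c: c.reward)
        path.append(node.name)
        total += node.reward
    return path, total


def best_path(node: Node) -> Tuple[List[str], float]:
    """Exhaustive game tree search: maximize the total reward over the rest of the path."""
    if not node.children:
        return [], 0.0
    best = None
    for child in node.children:
        sub_path, sub_total = best_path(child)
        total = child.reward + sub_total
        if best is None or total > best[1]:
            best = ([child.name] + sub_path, total)
    return best


# Hypothetical tree: "play" pays off now but leads nowhere; "study" costs now but pays later.
tree = Node("start", 0, [
    Node("play", 3, [Node("play more", 1)]),
    Node("study", -1, [Node("get good", 10)]),
])

print(greedy_path(tree))  # (['play', 'play more'], 4.0)
print(best_path(tree))    # (['study', 'get good'], 9.0)
```

The greedy player never picks "study" because its immediate reward is negative, so it never reaches the large reward behind it. Exhaustive search works on a toy tree like this, but in real life the tree is far too large to search, which is where the lifehacks above come in.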

Yeah, I know all that. I just don't do it.

Exploration-exploitation trade-off

---

This page is auto-translated from /nishio/短期的報酬に最適化すると弱くなる例 using DeepL. If you find something interesting but the auto-translated English is not good enough to understand it, feel free to let me know at @nishio_en. I'm very happy to spread my thoughts to non-Japanese readers.